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Design of Optimal Sparse Feedback Gains via 
the Alternating Direction Method of Multipliers 

Fu Lin, Makan Fardad, and Mihailo R. Jovanovic 

Abstract 

We design sparse and block sparse feedback gains that minimize the variance amplification (i.e., 
the %2 norm) of distributed systems. Our approach consists of two steps. First, we identify sparsity 
patterns of feedback gains by incorporating sparsity-promoting penalty functions into the optimal control 
problem, where the added terms penalize the number of communication links in the distributed controller. 
Second, we optimize state feedback gains subject to structural constraints determined by the identified 
sparsity patterns. This polishing step improves the quadratic performance of the distributed controller. 
In the first step, we identify sparsity structure of feedback gains using the alternating direction method 
of multipliers, which is a powerful algorithm well-suited to large optimization problems. This method 
alternates between promoting the sparsity of controllers and optimizing the closed-loop performance, 
which allows us to exploit the structure of the corresponding objective functions. In particular, we take 
advantage of the separability of the sparsity-promoting penalty functions to decompose the minimization 
problem into sub-problems that can be solved analytically. Even though the H2 norm is in general a 
nonconvex function of the feedback gain, we identify classes of convex problems that arise in the design 
of sparse undirected networks. We show that the corresponding synthesis problem can be formulated as 
a semidefinite program, implying that the globally optimal sparse controller can be computed efficiently. 
Several examples are provided to illustrate the effectiveness of the developed approach. 

Index Terms 

Alternating direction method of multipliers, communication architectures, continuation methods, 
convex optimization, l\ minimization, network design, semidefinite program, separable penalty func- 
tions, sparsity-promoting optimal control, structured distributed design. 

I. Introduction 

The design of distributed controllers for interconnected systems has received considerable 
attention in recent years [1]— [17]. Research efforts have focused on two major issues, namely, 
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the design of communication architectures of the distributed controllers and the design of the 
optimal controllers under a priori specified structural constraints. 

The identification of convex classes of distributed control problems has been an active re- 
search topic in recent years. For spatially invariant systems, the design of quadratically optimal 
controllers can be cast into a convex problem if the information in the controller propagates at 
least as fast as in the plant [3], [6]. A similar but more general algebraic characterization of 
the constraint set was introduced and convexity was established under the condition of quadratic 
invariance in [7]. Since these convex formulations are expressed in terms of the impulse response 
parameters, they do not lend themselves easily to state-space characterization. In [14], a state- 
space realization of the optimal distributed controllers that satisfy cone-causality property was 
provided and methods for design of sub-optimal controllers were developed. 

Characterizing the structural properties of optimal distributed controllers is another important 
question. For spatially invariant systems, the linear quadratic controllers are also spatially invari- 
ant and the measurements from other subsystems are exponentially discounted with the distance 
between the controller and the subsystems [1]. This spatially decaying property is extended 
to systems on graphs in [8] and it motivates the search for inherently localized controllers. 
Rather than truncating optimal centralized gains, an augmented Lagrangian approach was used to 
design structured optimal % 2 feedback gains in [16]. Furthermore, a continuation-based Newton's 
method provided the optimal localized controllers for vehicular strings in [18]. 

In this paper, we develop methods for the design of sparse and block sparse feedback gains that 
minimize variance amplification of the distributed system. Our approach consists of two steps. 
The first step, which can be viewed as a structure identification step, is aimed at finding sparsity 
patterns S that strike a balance between the "H 2 performance and the sparsity of the controller. 
This is achieved by incorporating sparsity-promoting penalty functions into the optimal control 
problem, where the added sparsity-promoting terms penalize the number of communication links. 
We consider several sparsity-promoting penalty functions including the cardinality function and 
its convex relaxations. In the absence of sparsity-promoting terms, the solution to the standard 
V.2 problem results in centralized controllers with dense feedback gains. By gradually increasing 
the weight on the sparsity-promoting penalty terms, the optimal feedback gain moves along a 
parameterized solution path from the centralized to the sparse gain of interest. This weight 
is increased until the desired balance between the performance and the sparsity patterns S is 
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achieved. In the second step, we solve an optimal control problem subject to the feedback gain 
belonging to the identified structure S. This polishing step can improve the % 2 performance of 
the structured controller; see [10], [16] for related efforts. 
Major contributions of this paper are summarized next: 

• We identify classes of convex problems that arise in several emerging applications including 
sparse synthesis of undirected networks of single-integrators, coordination of vehicular 
formations, and synchronization of oscillator networks. We show that the optimal control 
problem that utilizes the weighted l x norm as the sparsity-promoting penalty function can 
be formulated as a semidefinite program (SDP), thereby implying that the globally optimal 
sparse controller can be computed efficiently. 

• We demonstrate that the alternating direction method of multipliers (ADMM) [19] provides 
an effective tool for the design of sparse distributed controllers whose performance is 
comparable to the performance of the optimal centralized controller. This method alternates 
between promoting the sparsity of the feedback gain matrix and optimizing the closed-loop 
H2 norm. The advantage of this alternating mechanism is threefold. First, it provides a 
flexible framework for incorporation of different penalty functions that promote sparsity or 
block sparsity. Second, it allows us to exploit the separability of the sparsity-promoting 
penalty functions and to decompose the corresponding optimization problems into sub- 
problems that can be solved analytically. These analytical results are immediately applicable 
to other distributed control problems where sparsity is desired. Finally, it facilitates the use 
of descent algorithms for the H2 optimization, in which a descent direction can be formed 
by solving two Lyapunov equations and one Sylvester equation. 

Our approach is motivated in part by the emerging field of compressive sensing [20]. The 
£1 norm is widely used as a proxy for cardinality minimization in applied statistics, in sparse 
signal processing, and in machine learning; see [19], [21], [22]. In the controls community, recent 
work inspired by similar ideas includes [23]-[26]. In [23], an £ induced gain was introduced to 
quantify the sparsity of the impulse response of a discrete-time system. In [24], [25], the weighted 
£1 framework was used to design structured dynamic output feedback controllers subject to a 
given "Hoc, performance. In [26], an £1 relaxation method was employed for the problem of 
adding a fixed number of edges to a consensus network. 

Our presentation is organized as follows. We formulate the sparsity-promoting optimal control 
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problem and compare several sparsity-promoting penalty functions in Section [nj We identify 
classes of convex sparse synthesis problems and establish the corresponding SDP formulation 



in Section III We present the ADMM algorithm, emphasize the separability of the penalty 
functions, and provide the analytical solutions to the sub-problems for both sparse and block 



sparse minimization problems in Section IV Several examples are provided in Sections III 
and [V] to demonstrate the effectiveness of the developed approach. We conclude the paper with 



a summary of our contributions in Section VI 



II. Sparsity-promoting Optimal Control Problem 
Consider the following control problem 



x = Ax + Bid + B 2 u 
z = C x + Du 
u = — F x 



(1) 



where d and u are the disturbance and control inputs, z is the performance output, C = 
[Q 1 / 2 0] T , and D — [0 i? 1//2 ] T , with standard assumptions that (A, B 2 ) is stabilizable and 
(A, Q 1/2 ) is detectable. The matrix F is a state feedback gain, Q = Q T > and R = R T > 
are the state and control performance weights, and the closed-loop system is given by 



x = (A — B 2 F) x + B 1 d 
-R l l 2 F 



(2) 



x. 



The design of the optimal state feedback gain F, subject to structural constraints that dictate 
its zero entries, was recently considered by the authors in [10], [16]. Let the subspace S embody 
these constraints and let us assume that there exists a stabilizing F E S. References [10], [16] 
then search for F E S that minimizes the T-L 2 norm of the transfer function from d to z, 



where 



J(F) 



minimize J(F) 
subject to F E S 

trace [BTp{F)B 1 ) 1 F stabilizing 
oo, otherwise. 



(SH2) 



(3) 
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The matrix P(F) in §3§ denotes the closed-loop observability Gramian 

POO 

P(F) = / e^ A - B ^ Tt (Q + F T RF)^ A - B ^ t dt (4) 
Jo 

which can be obtained by solving the corresponding Lyapunov equation 

{A - B 2 F) T P + P{A- B 2 F) = - (Q + F T RF) . (5) 



While the communication architecture of the controller in (SH2) is a priori specified, in this 
paper our emphasis shifts to identifying favorable communication structures without any prior 
assumptions on the sparsity patterns of the matrix F. We propose an optimization framework in 
which the sparsity of the feedback gain is directly incorporated into the objective function. 

Consider the following optimization problem 

minimize J(F) + 7 go(F) (6) 

where 

g (F) = card(F) (7) 

denotes the cardinality function, i.e., the number of nonzero elements of a matrix. Note that, 



in contrast to problem (SH2) no structural constraint is imposed on F; instead, our goal is 
to promote sparsity of the feedback gain by incorporating the cardinality function into the 
optimization problem. The positive scalar 7 characterizes our emphasis on the sparsity of F; a 
larger 7 encourages a sparser F, while 7 = renders a centralized gain that is the solution of 
the standard LQR problem. For 7 = 0, the solution to ^ is given by F c = R~ 1 B 2 r P, where P 
is the unique positive definite solution of the algebraic Riccati equation 

A T P + PA + Q - PB 2 R- l BlP = 0. (8) 

A. Sparsity-promoting penalty functions 

Problem ([6]) is a combinatorial optimization problem whose solution usually requires an in- 
tractable combinatorial search. In optimization problems where sparsity is desired, the cardinality 
function is typically replaced by the i\ norm of the optimization variable [27, Chapter 6], 

9i(F) = \\F\U, = (9) 
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Recently, a weighted l\ norm was used to enhance sparsity in signal recovery [22], 

92(F) = J^W^l (10) 

where Wij £ K are positive weights. If Wi/s are chosen to be inversely proportional to the 
magnitude of F^, i.e., {Wy = l/\F i:j \, F tj ^ 0; = l/e, F^ = 0, < e < 1}, then the 
weighted i\ norm and the cardinality function of F coincide, £\ . \Fij\ = card (F) . 

The above scheme for the weights, however, cannot be implemented, since the weights depend 
on the unknown feedback gain. A reweighted algorithm that solves a sequence of weighted t\ op- 
timization problems in which the weights are determined by the solution of the weighted t\ prob- 
lem in the previous iteration was proposed in [22], [28]. This reweighted scheme was recently em- 
ployed by the authors to design sparse feedback gains for a class of distributed systems [29], [30]. 

Both the l\ norm and its weighted version are convex relaxations of the cardinality function. 
On the other hand, we also examine utility of the nonconvex sum-of-logs function as a more 
aggressive means for promoting sparsity [22] 

g 3 (F) = J^log M + , < e < 1. (11) 

Remark 1: Design of feedback gains that have block sparse structure can be achieved by 
promoting sparsity at the level of the submatrices instead of at the level of the individual elements. 
Let the feedback gain F be partitioned into submatrices F^- £ ]R m * xn i that need not have the 
same size. The weighted l\ norm and the sum-of-logs can be generalized to matrix blocks by 



replacing the absolute value of F^ in ( 10) and ( 11 ) by the Frobenius norm || ■ \\p of Fy. Similarly, 



the cardinality function yj should be replaced by ^^card (H-P^Hf) , where H-F^ Hf does not 
promote sparsity within the F^ block; it instead promotes sparsity at the level of submatrices. 

B. Sparsity-promoting optimal control problem 

Our approach to sparsity-promoting feedback design makes use of the above discussed penalty 
functions. In order to obtain state feedback gains that strike a balance between the quadratic 
performance and the sparsity of the controller, we consider the following optimal control problem 

minimize J(F) + 7 g(F) (SP) 
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where J is the closed-loop 1-L 2 norm ([3]) and g is a sparsity-promoting penalty function, e.g., 
given by 0, ([9j), ( [T0| >, or {n}. When cardinality function in ((7]) is replaced by ((9]), ([10]), or ( [TT) , 



problem |(SP)| can be viewed as a relaxation of the combinatorial problem Q-Q, obtained by 
approximating the cardinality function with the corresponding penalty functions g. 



As the parameter 7 varies over [0, +00), the solution of (SP) traces the trade-off path between 
the H2 performance J and the feedback gain sparsity g. When 7 = 0, the solution is the 
centralized feedback gain, which can be computed from the solution of the algebraic Riccati 
equation ([8]). We then slightly increase 7 and employ an iterative algorithm - the alternating 
direction method of multipliers (ADMM) - initialized by the optimal feedback matrix at the 



previous 7. The solution of (SP)| becomes sparser as 7 increases. After a desired level of sparsity 



is achieved, we fix the sparsity structure and find the optimal structured feedback gain by solving 



the structured "H 2 problem (SH2) 



Since the set of stabilizing feedback gains is in general not convex [31] and since the 
matrix exponential is not necessarily a convex function of its argument [27], J need not be 
a convex function of F. This makes it difficult to establish convergence to the global minimum 
of |(SP)[ In Section |ni} we identify classes of convex problems that arise in several emerging 
distributed control applications including sparse synthesis of undirected networks of single- 
integrators, coordination of vehicular formations, and synchronization of oscillator networks. 
Even in problems for which we cannot establish convexity of J(F), our extensive computational 



experiments suggest that the algorithms developed in Section IV provide an effective means for 



attaining a desired trade-off between the "H 2 performance and the sparsity of the controller. 

III. Design of Undirected Single-integrator Networks: A Convex Problem 

An important question in the optimal design of networks of dynamic systems is related to 
the selection of interconnection topology, i.e., the identification of controller architectures that 
strike an optimal balance between performance and communication. The information patterns 
in the corresponding optimal control problem are typically described by a graph, with applica- 
tions ranging from coordinated control of multi-agent systems, to average consensus in sensor 
networks, to synchronization of networks of oscillators [15], [18], [29], [30], [32]-[37]. Several 
previous research efforts have focused on characterizing performance limitations in large-scale 
networks with fixed topologies [15], [18], [33], [34], [36]-[39]. Recently, a number of authors 
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have considered the problem of modifying the network topology to improve its performance [26], 
[40]-[44]. 

For undirected networks of single-integrators, we next show that sparsity-promoting feedback 



synthesis problem (SP) with g being the weighted i\ norm can be formulated as a semidefinite 
program (SDP). This implies that the globally optimal sparse controller can be computed effi- 
ciently. It is worth noting that the framework of this section also bears relevance to the design 
of undirected networks of double-integrators that are, for example, encountered in coordination 
of vehicular formations and synchronization of oscillator networks. 



A. A semidefinite program formulation for \(SP) 



For an undirected network of iV single-integrators, the closed-loop system ([2]) is determined 

by 

A = O, B 1 = B 2 = I, F = F T (12) 

where / and O are iV x N identity and zero matrices, and the Lyapunov equation ([5]) becomes 

FP + PF = Q + FRF. 

The closed-loop stability amounts to positive definiteness of the feedback gain, F y 0, and the 
objective function ([3]) can be expressed as 

J(F) = (1/2) trace (F^Q + RF). (13) 

This scenario is of interest in the design of undirected networks of single-integrator vehicles 
where it is desired to tightly regulate both the relative position errors between vehicles and the 



absolute position of the formation's center of mass [18], [29]; see Section III-B.l 

For the average consensus problem over undirected networks, however, F is a positive semidef- 
inite Laplacian matrix of a connected graph, F y 0, that satisfies Ft = and F + 11 T /N y 
with 1 denoting the vector of all ones. The state weight Q is typically selected to penalize the 
deviation from consensus; see Section III-B.2 If Q satisfies Qt = and Q + tt T /N y 0, then 



the average mode x := (1/N) t T x associated with the eigenvector 1 of F is unobservable from 
the performance outputs. Consequently, the observability Gramian P satisfies PI = 0, and it is 
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determined by the solution of the Lyapunov equation 



(F + ll T /N) P + P(F + lt T /N) = Q + FRF 

where the term tt T /N gets canceled when multiplied by P. Therefore, the objective function ^ 
can be expressed as 



J(F) = (1/2) trace ((F + tt T /N^Q + RF). 



(14) 



(15) 



Lemma 1: Suppose that Qt = and Q + tt T /N y 0. Then the optimization problem 

minimize J(F) = (1/2) trace ((F + tt T /N)- l Q + RF) 

subject to Ft = 0, F + 11 T /A^^0 
can be formulated as an SDP 

minimize (1/2) trace (X + RF) 

X, F 

X Q 1 / 2 
Ql/2 p+tt T /N 

Ft = 0. 

Proof: See Appendix [A] ■ 
We now state the main result of this section. 

Proposition 2: For system d2]>-( 12) with the objective function J given by either ( 13 ) or ( 14), 
the sparsity-promoting optimal control problem 



subject to 



y o 



(16) 



minimize J(F) + 7 \Fij\ 



(17) 



h3 



can be formulated as an SDP 



minimize (1/2) trace (X + RF) + -yt T Yt 

X,Y,F 



subject to 



X Q 1 / 2 
Q 1 ' 2 F + att T /N 
aFt = 
-Y < WoF < Y, 



y 



(18) 
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where a = and a = 1 correspond to J in ([l"3j) and (fT4b, respectively. 



Proof: We begin by transforming the nonsmooth weighted t\ norm in (17) to a linear 
function with linear inequality constraints 

minimize J(F) + 7 t T Yt 

Y,F 

subject to -Y <W o F <Y 
where Y is a matrix with nonnegative elements, Y > 0, and o is the elementwise multiplication 



of matrices. Using Schur complement, the minimization of J in (13) can be cast as an SDP 



minimize (1/2) trace (X + RF) 

X, F 



subject to 



X Q 1 / 2 
Q 1 ' 2 F 



y 0. 



(19) 



Note that F y whenever Q y in the LMI in (19). This shows the equivalence between the 
SDP (18) with a = and the minimization problem (17) with J given by (13). Lemma [T] in 
conjunction with the transformation of the weighted t\ norm to linear inequality constraints, can 



then be used to show that the SDP (18) with a = 1 is equivalent to the minimization problem 



(17) with J given by (14). This completes the proof. 



For small and medium problems (e.g., iV < 50), the global solution of ( 17 ) can be obtained 
using standard SDP solvers. For large problems, customized interior point method can be devel- 



oped to solve (17); see [42] for related problems. Alternatively, the ADMM algorithm developed 



in Section IV provides a powerful tool for large-scale optimal control problems given by (SP) 



Remark 2: The above presented framework for the design of undirected networks of single- 
integrators is also relevant for the design of undirected networks with higher order sub-systems [18], 
[30]. For an undirected network of double-integrator vehicles with uniform diagonal velocity 
gain, the design of a symmetric position gain matrix to maintain a desired spacing between the 
vehicles amounts to minimization of J given by ( fl"3| ); see [18]. For a network of LC-oscillators 
with identical oscillation frequency, the design of the conductance matrix to keep small difference 
in voltage among oscillators amounts to minimization of J given by ( p"4| ); see [30]. Thus, these 
design problems can be also addressed using the SDP-formulation presented in Proposition |2j 
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B. Examples 

To illustrate utility of Proposition [2} we next provide examples that are encountered in the 
design of sparse vehicular formation and average consensus networks. We show that these classes 



of problems can be formulated as the SDP (18). 



1 ) Leader-follower vehicular formation: Consider N single-integrator vehicles 

where U\ and di are the control and the disturbance acting on the ith vehicle. A vehicle is a 
follower if it uses only the relative information exchange with other vehicles to compute its 
control action, 



AT 



A vehicle is a leader if, in addition to the relative information exchange with other vehicles, it 
also has access to its own state 

N 

Ui ^ ^ Fij (j^i ^j) -^ii 

For a formation shown in Fig. [Tj with the 1st and the iVth vehicles being leaders and all other 
vehicles being followers, we penalize both the absolute position errors of the vehicles 

N 

x Qg x = y x i 

8=1 

and the relative position errors between the vehicles 

N-l 

x T Qix = y^( Xi - x i+ if . 

i = X 

Thus, Q g = I and Qi is a symmetric tridiagonal matrix with —1 on its first sub- and super- 
diagonal; the first and the last elements on the main diagonal of Qi are equal to 1, and all other 
elements on its main diagonal are equal to 2. We set Q = Q g + Qi and choose R = I. 

The communication graphs determined by sparsity structures of F for different values of 7 are 
illustrated in Fig.[2j For 7 = 0, we have a centralized controller in which all-to-all communication 
is required in order to form the control action; for 7 = 0.02, a star-like topology with respect to 
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xra 



N 



Fig. 1: One-dimensional vehicular formation with the 1st and the iVth vehicles being leaders 
(highlighted in red color) and all other vehicles being followers. 



1 10 1 10 




(a) 7 = (b) 7 = 0.02 (c) 7 = 0.06 



Fig. 2: For illustration purpose, we wrap the communication graph of the formation around a 
circle, (a) All-to-all communication architecture for 7 = 0; (b) leaders-to-all plus nearest neighbor 
interaction for 7 = 0.02; (c) leaders-to-some plus nearest neighbor interaction for 7 = 0.06. 



leaders is identified; and for 7 = 0.06, we obtain a controller with leaders-to-some plus nearest 
neighbor interactions. We note that the number of vehicles that communicate with the leaders 
decreases with 7, and that the nearest neighbor interaction pattern emerges for large values of 7. 

2) Undirected consensus networks: Consider an undirected connected network with N nodes 
in which each node updates its scalar state using a weighted average of differences between its 
own state and the states of its neighbors 

This problem can be viewed as either the vehicular formation problem without leaders or as a 
sensor network problem in which the sensors are trying to reach consensus. Since the leaderless 
network with only relative information exchange is not asymptotically stable, the average of all 
states x := (1/iV) l T x in the presence of stochastic disturbances undergoes a random walk [33]. 
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Following [36], we consider the local and global performance errors that render the average 
mode x unobservable. The global error quantifies the deviation of each node's state from the 
average of all states 

1 N 

\^g)i %i X X{ -r^ ^ ^ Xj . 



8=1 



In contrast, the local error quantifies the difference between the states of neighboring nodes 



,h j 



Xj for E £ 



where £ is the edge set with E £ denoting an edge between two neighboring nodes. 
For N = 50 randomly distributed nodes in a region of 10 x 10 units, let two nodes be neighbors 



if their Euclidean distance is not greater than 2 units; see Fig. 3a Thus, the performance output 
is given by 



u 



Q 



1/2 



where Qi = E E T and E is the incidence matrix of the local performance graph. 



As shown in Fig. 3b , in addition to the interactions between neighbors, the identified com- 
munication graph also establishes long-range links between selected pairs of remote nodes. 
Relative to the centralized gain F c , the identified sparse gain F uses 7% nonzero elements (i.e., 
card (F) /card (F c ) = 7%) and achieves a performance loss of only 14% (i.e., (J — J c )/J c = 



14%). The optimal gains are obtained by solving the SDP ( |T8| ) with a = 1, 7 = (for F c ), and 
7 = 1 (for F). 

IV. Identification of Sparsity-patterns via ADMM 

The alternating direction method of multipliers (ADMM) has been studied extensively since the 
1970s. This simple but powerful algorithm blends the separability of the dual decomposition with 
the superior convergence and robustness of the method of multipliers [19]. As a consequence, it 
has been employed in a wide range of applications [45]-[47]; reference [19] provides an excellent 
survey with emphasis on its application to large-scale distributed optimization problems. 

Consider the following constrained optimization problem 



minimize J(F) + 7 g(G) 
subject to F - G = 



(20) 
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(a) (b) 7 = 1 



Fig. 3: (a) Local performance graph where edges connect every pair of nodes with a distance 
not greater than 2 units, (b) Identified communication graph for 7 = 1, where the long-range 
communication links are highlighted in black color. 



which is clearly equivalent to the problem |(SP)[ The augmented Lagrangian associated with the 
constrained problem ([20]) is given by 



C P (F, G, A) = J(F) + 7 (/(GO + trace (A T (F - Gj) + ^\\F - Gf F 

where A is the dual variable (i.e., the Lagrange multiplier), p is a positive scalar, and || • ||^ 
is the Frobenius norm. By introducing an additional variable G and an additional constraint 



F — G = 0, we have simplified the problem (SP) by decoupling the objective function into two 



parts that depend on two different variables. As discussed below, this allows us to exploit the 
structures of J and g. We note that reference [19] provides many examples of such equivalent 
problem formulations suitable for the application of ADMM. 



In order to find a minimizer of the constrained problem (20), the ADMM algorithm uses a 
sequence of iterations 

F k+l : = argmin£ p (F,G fc ,A fc ) (21a) 

F 

G k+1 : = argmin£ p (F fe+1 ,G,A fc ) (21b) 

G 

A k+i ._ A k + p ( F k+i _ G k+ij (2 lc) 
until ||F fe+1 - G fe+1 || F < e and ||G fc+1 - G k \\ F < e. In contrast to the method of multipliers [19], 
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in which F and G are minimized jointly, 



fe+i /ofc+i\ 



(F fe+1 ,G 



arg min C P (F, G, A k 

f,g 



ADMM consists of an F-minimization step (21a), a G-minimization step (21b), and a dual 



variable update step (21c). Thus, the optimal F and G are determined in an alternating fashion, 



which motivates the name alternating direction. Note that the dual variable update (21c) uses a 
step-size equal to p, which guarantees that one of the dual feasibility conditions is satisfied in 
each ADMM iteration; see [19, Section 3.3]. 



ADMM brings two major benefits to the sparsity-promoting optimal control problem (SP) 



Separability of g. The penalty function g is separable with respect to the individual el- 
ements of the matrix. In contrast, the closed-loop I-L2 norm cannot be decomposed into 
componentwise functions of the feedback gain. By separating g and J in the minimization 



of the augmented Lagrangian C p , we can decompose G-minimization problem (21b) into 



sub-problems that only involve scalar variables. This allows us to determine analytically 



the solution of (21b). 



• Differentiability of J. The closed-loop H2 norm J is a differentiate function of the feedback 
gain matrix [16]; this is in sharp contrast to g which is a non-differentiable function. 
By separating g and J in the minimization of the augmented Lagrangian C p , we can 
utilize descent algorithms that rely on the differentiability of J to solve the F-minimization 
problem ( |21a| ). 

We next provide the analytical expressions for the solutions of the G-minimization prob- 
lem (21b) in Section IV-A[ describe a descent method to solve the F-minimization problem ( 21a| ) 
in Section IV-B present Newton's method to solve the structured problem [(SH2)| in Section IV-C 



and discuss the convergence of ADMM in Section IV-D 



A. Separable solution to the G -minimization problem (21b) 

The completion of squares with respect to G in the augmented Lagrangian C p can be used to 



show that (21b) is equivalent to 



minimize 0(G) = 75(G) + (p/2)||G - V k \\ 2 F 



(22) 



where V k = (l/p)A fc + F k+1 . To simplify notation, we drop the superscript k in V k throughout 
this section. Since both g and the square of the Frobenius norm can be written as a summation 
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of componentwise functions of a matrix, we can decompose (22) into sub-problems expressed 
in terms of the individual elements of G. For example, if g is the weighted t\ norm, then 

4>{G) = ^r(Mr (/ <;,, + (p/2)(G y - v l3 f). 



This facilitates the conversion of (22) to minimization problems that only involve scalar variables 
Gij. By doing so, the solution of (22) for different penalty functions can be determined analyt- 



ically including the weighted i\ norm, the sum-of-logs function, and the cardinality function. 



1) Weighted t\ norm: The unique solution to ( |22| ) is given by the soft thresholding opera- 
tor (e.g., see [19, Section 4.4.3]) 



G 



'j 



1 - 



\V*\ 



V, 



\Vij\ > a 



\Vij\ < a 



(23) 



where a = (j/p)Wij] see Fig. 4a for an illustration. For given V^, G* is obtained by moving 



Vij towards zero with the amount ('y/p)W i j. In particular, G*- is set to zero if \Vij\ < ( r y/p)Wij, 
implying that a more aggressive scheme for driving G* to zero can be obtained by increasing 
7 and Wij and by decreasing p. 



2) Cardinality function: The unique solution to ( [22] ) is given by the truncation operator 



\ Vii, 


K\ 


(0, 


\Vij\ 



(24) 



where b = a/27/ p; see Fig. 4b for an illustration. For given Vij, G* is set to Vij if \Vij\ > a/27/ p 
and to zero if | V^ \ < \J2^/ p. 

3) Sum-of-logs function: As shown in Appendix [Cj the solution to (22) is given by 



0, A < 

0, A > and r+ < 

r + , A > and r" < and < r + < 1 

G°, A > and < r ± < 1 



(25) 



where 



A 

,,-± 



(l^|+^) 2 -4(7/p) 



(26) 
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0.015 
0.01 
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-0.005 
-0.01 



-a 



G. 



a 



(a) a = ( 7 /p)W« 



0.02 -0.01 0.01 0.02 

(c) 7 = 0.1 



-b 
/ 



b Vi i 



(b) b = ^f P 



-0.1 0.1 0.2 



(d) 7 = 1 



-1 -0.5 0.5 1 

(e) 7 = 10 



Fig. 4: (a) The soft thresholding operator (23); (b) the truncation operator (24). The slope of the 
lines in both (a) and (b) is equal to one. (c)-(e) The operator (25) with {p = 100, e = 0.1} for 
different values of 7. For 7 
for 7 



0.1, (25) resembles the soft thresholding operator (23) in Fig. 4a 



10, it resembles the truncation operator (24) in Fig. 4b, for 7 = 
the difference between the soft thresholding and truncation operators. 



1, operator (25) bridges 



and G° : = argmin {4>ij(r + Vij), 0^(0)}. For fixed p and e, (25) is determined by the value of 7; 



see Figs. |4qJ4e| For small 7, ( [25] ) resembles the soft thresholding operator (cf. Figs. [4c] and |4a|) 
and for large 7, it resembles the truncation operator (cf. Figs. 4e and 4b). In other words, (25) 
can be viewed as an intermediate step between the soft thresholding and the truncation operators. 
Remark 3: In block sparse design, g is determined by j J2i j Wij \\Gij J2i j card (HC^Hf); 
+ HCy H^/ 5 ) j ) an d the minimizers of (22) are obtained by replacing the absolute 
value of Vij in (23), (24), and (26), respectively, with the Frobenius norm || ■ \\ F of the corre- 
sponding block submatrix V^. 



B. Anderson-Moore method for the F -minimization problem (21a) 



We next employ the Anderson-Moore method to solve the F-minimization problem (21a). 
The advantage of this algorithm lies in its fast convergence (compared to the gradient method) 
and in its simple implementation (compared to Newton's method); e.g., see [16], [48], [49]. 



When applied to the F-minimization problem (21a), this method requires the solutions of two 
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Lyapunov equations and one Sylvester equation in each iteration. We next recall the first and 
second order derivatives of J; for related developments, see [49]. 
Proposition 3: The gradient of J is determined by 



VJ(F) = 2(RF - B*P)L 
where L and P are the controllability and observability Gramians of the closed-loop system, 



(A - B 2 F) L + L(A- B 2 F) T = - B X B\ 

{A - B 2 F) T P + P(A- B 2 F) = - (Q + F T RF). 



(NC-L) 
(NC-P) 



The second-order approximation of J is determined by 



J(F + F) « J(F) + (VJ(F),F) + (1/2) (H(F,F),F) 



where H(F, F) is the linear function of F, 



H(F, F) = 2 ( (RF - BjP) L + (RF - B^P) L 



and L, P are the solutions of the following Lyapunov equations 

(A - B 2 F) L + L(A- B 2 F) T = B 2 FL + (B 2 FL) T 

(A - B 2 F) T P + P(A- B 2 F) = (PB 2 - F T R) F + F T (BjP - RF). 
By completing the squares with respect to F in the augmented Lagrangian C p , we obtain the 



following equivalent problem to (21a) 



minimize <p(F) = J(F) + (p/2)\\F - U k \\ 2 F 
where U k = G k — (l/p)A k . Setting := VJ + p(F — U k ) to zero yields the necessary 



conditions for optimality 



2 (RF - BTp) L + p(F - U k ) =0 



(NC-F) 



where L and P are determined by (NC-L) and (NC-P) 



Starting with a stabilizing feedback F, the Anderson-Moore method solves the two Lyapunov 
equations (NC-L)| and (NC-P)[ and then solves the Sylvester equation (NC-F) to obtain a new 



feedback gain F. In other words, it alternates between solving (NC-L) and (NC-P) for L and P 
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with F being fixed and solving (NC-F) for F with L and P being fixed. It can be shown that the 
difference between two consecutive steps F = F — F forms a descent direction of ip; see [16] 
for a related result. Thus, standard line search [50] can be employed to determine step-size s in 
F + sF to guarantee the convergence to a stationary point of <p. 

Remark 4 ( Closed-loop stability): Since the % 2 norm is well defined for causal, strictly proper, 
stable closed-loop systems, we set J to infinity if A — B 2 F is not Hurwitz. Furthermore, J is a 
smooth function that increases to infinity as one approaches the boundary of the set of stabilizing 
gains [16]. Thus, the decreasing sequence of {tp(F 1 )} ensures that {F 1 } are stabilizing gains. 



Remark 5: For the design of single-integrator networks (17), we use Newton's method to 



solve the F-minimization problem and provide the gradient and Hessian of J in Appendix |Bj 

C. Solving the structured H2 problem: Polishing step 

We next turn to the I-L2 problem subject to structural constraints on the feedback matrix. Here, 



we fix the sparsity patterns F E S identified using ADMM and then solve (SH2) to obtain the 
optimal feedback gain that belongs to the subspace S. This polishing step, which is commonly 
used in optimization [27, Section 6.3.2], can improve the performance of sparse feedback gains 
resulting from the ADMM algorithm. 

As noted in Remark [4j the sparse feedback gains obtained in the ADMM algorithm are 
stabilizing. This feature of ADMM facilitates the use of Newton's method to solve (SH2)| Given 



an initial gain F° E S, a decreasing sequence of the objective function {J(F 1 )} is generated by 
updating F according to F l+1 = F 1 + s % F % \ here, s l is the step-size and F l E S is the Newton 
direction that is determined by the minimizer of the second-order approximation of the objective 
function Equivalently, F l E S is the minimizer of 

:= (1/2) (H(F) o I s , F) + (VJo/ 5)J P) 

where structural identity 1$ of subspace S (under entry-wise multiplication o of two matrices) 
is used to characterize structural constraints 

{1, if Fij is a free variable 
=> F o I s = F for F E S. 
0, if F^ = is required 

To compute Newton direction, we use the conjugate gradient (CG) method that does not require 
forming or inverting the large Hessian matrix explicitly; see [50, Chapter 5]. It is noteworthy that 
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techniques such as the negative curvature test [50, Section 7.1] can be employed to guarantee 
the descent property of Newton direction; consequently, line search methods, such as Armijo 
rule [50, Section 3.1], can be used to generate a decreasing sequence of J. 



D. Convergence of ADMM 

For convex problems, e.g., the design of single-integrator networks with the weighted l\ norm 
as the sparsity-promoting penalty function, the convergence of ADMM to the global minimizer 
follows from standard results [19]. For nonconvex problems, where convergence results are not 
available, extensive computational experience suggests that ADMM works well when the value 
of p is sufficiently large [51], [52]. This is attributed to the quadratic term (p/2)\\F — G\\p that 
tends to locally convexify the objective function for sufficiently large p; see [53, Chapter 14.5]. 



We next examine convergence of the ADMM algorithm for problem (SP) with g determined 



by the weighted l\ norm (10). In this case, we show that when ADMM converges it converges 
to a critical point of (SP) For a convergent point (F*,G*,A*) of the sequence {F k ,G k , A k }, 



(21c) simplifies to 

F* - G* = 0. 

Since F* minimizes C P (F, G*,A*) and since G* minimizes C P (F*, G, A*), we have 

= VJ(F*) + A* 
e idg(G*) - A" 



where dg is the subdifferential of the convex function g in ( 10). Therefore, (F*, G*) satisfies the 



necessary conditions for the optimality of (SP) and ADMM converges to a critical point of (SP) 



V. Examples 



We next use three examples to illustrate the utility of the approach developed in Section IV 
The identified sparsity structures result in localized controllers in all three cases. Additional 
information about these examples, along with MATLAB source codes, can be found at 

www. ece . u m n . ed u/~ m i h ai I o/sof tware/lq rs p/ 

A. Mass-spring system 
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^mM~yt^~]^NMr • • • -MiyA hAwM K/wwir| 
Fig. 5: Mass-spring system. 



For a mass-spring system with iV masses shown in Fig. [5} let pj be the displacement of 
the zth mass from its reference position and let the state variables be X\ := [pi ■ ■ ■ Pn] T and 
x 2 '■= [Pi ■ ■ ■ Pn] T ■ For simplicity we consider unit masses and spring constant; note that our 
method can be used to design controllers for arbitrary values of these parameters. The state-space 
representation is then given by ([TJ) with 



.4 



O I 
T O 



Bi 



Bo 



O 
I 



where T is an iV x iV symmetric tridiagonal matrix with —2 on its main diagonal and 1 on 
its first sub- and super-diagonal, and / and O are N x iV identity and zero matrices. The state 
performance weight Q is the identity matrix and the control performance weight is R = 101. 
We use the weighted l x norm as the sparsity-promoting penalty function, where we follow [22] 



and set the weights to be inversely proportional to the magnitude of the solution F* of (SP) 
at the previous value of 7, 



Wi. 



1/(1*51 +e 



(27) 



This places larger relative weight on smaller feedback gains and they are more likely to be 
dropped in the sparsity-promoting algorithm. Here, e = 1CT 3 is introduced to have well-defined 
weights when F* = 0. 

The optimal feedback gain at 7 = is computed from the solution of the algebraic Riccati 
equation ([8]). As 7 increases, the number of nonzero sub- and super-diagonals of both position 
F* and velocity F* gains decreases; see Fig. [6j Eventually, both F* and F* become diagonal 
matrices. It is noteworthy that diagonals of both position and velocity feedback gains are nearly 
constant except for masses that are close to the boundary; see Fig. |7j 

After sparsity structures of controllers are identified by solving |(SP)[ we fix sparsity patterns 



and solve structured "H 2 problem (SH2) to obtain the optimal structured controllers. Comparing 
the sparsity level and the performance of these controllers to those of the centralized controller 
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°[= 1 1 isc — — i 




50 r,o i . •» 

2(1 -10 60 80 100 20 -10 lil) 80 100 

(a) 7 = 10" 4 (b) 7 = 0.0105 

Fig. 6: Sparsity patterns of F* = [F* F*] G ]R 50x100 for the mass-spring system obtained 
using weighted l\ norm to promote sparsity. As 7 increases, the number of nonzero sub- and 
super-diagonals of F* and F* decreases. 



0.48 

o.jb-5™™ 



(a) 



(b) 



Fig. 7: (a) The diagonal of F* and (b) the diagonal of F* for different values of 7: 10 -4 (o), 
0.0281 (+), and 0.1 (*). The diagonals of the centralized position and velocity gains are almost 
identical to (o) for 7 = 10~ 4 . 



F c , we see that using only a. fraction of nonzero elements, the sparse feedback gain F* achieves 
I-L2 performance comparable to the performance of F c ; see Fig. [8] In particular, using about 2% 
of nonzero elements, "H 2 performance of F* is only about 8% worse than that of F c ; see Tabled 

7 0.01 0.04 0.10 

card (F*)/card (F c ) 9.4% 5.8% 2.0% 
(J(F*) - J{F C ))/J{F C ) 0.8% 2.3% 7.8% 

TABLE I: Sparsity vs. performance for mass-spring system. Using 2% of nonzero elements, H2 
performance of F* is only 7.8% worse than performance of the centralized gain F c . 
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(a) card (F*)/card (F c ) (b) (J(F 




Fig. 8: (a) The sparsity level and (b) the performance loss of F* compared to the centralized 
gain F c . 



B. Spatially distributed example 

Let N = 100 nodes be randomly distributed with a uniform distribution in a square region of 
10 x 10 units. Each node is a linear system coupled with other nodes through the dynamics [8] 

JV 

±i = }}A]ijXj + [Bi}ndi + [B 2 ]uUi 
i = i 

for i = 1, . . . , N, where [ • ]^ denotes the ijth block of a matrix and 



1 1 
1 2 



[B 



i \a 



2 n 





1 



, [A]i 



1 
1 



for i 7^ j. 



The coupling between two systems i and j is determined by the Euclidean distance a(i,j) 
between them. The performance weights Q and R are set to identity matrices. 



We use the weighted l x norm as the penalty function with the weights given by (27). As 7 in- 
creases, the communication architecture of distributed controllers becomes sparser. Furthermore, 
the underlying communication graphs gradually attain a localized communication architecture; 
see Fig. |9} Note that, using about 8% of nonzero elements of F c , H2 performance of F* is only 
about 28% worse than performance of the centralized gain F c ; see Table |n| Figure 10 shows 
the optimal trade-off curve between the "H 2 performance and the feedback gain sparsity. 

We note that the truncation of the centralized controller could result in a non- stabilizing 
feedback matrix [8]. In contrast, our approach gradually modifies the feedback gain and increases 
the number of zero elements, which plays an important role in preserving the closed-loop stability. 
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4 

(C) 7 : 



Fig. 9: The localized communication graphs of distributed controllers obtained by solving (SP) 



for different values of 7. The communication structure becomes sparser as 7 increases. Note that 
the communication graph does not have to be connected since the subsystems are (i) dynamically 
coupled to each other and (ii) allowed to measure their own states. 



7 12.6 26.8 68.7 

card (F*) / card (F c ) 8.3% 4.9% 2.4% 
(J(F*) - J(F C ))/J(F C ) 27.8% 43.3% 55.6% 

TABLE II: Sparsity vs. performance for the spatially distributed example. Using 8.3% of nonzero 
elements, % 2 performance of F* is only 27.8% worse than performance of the centralized gain 



60%,— 
* 

50% + 
+ 

40% + 



Ct| 30%- 



20%- 
10%- 



20% 40% "M ' 80% W0% 

card (F*) /card (F c ) 



Fig. 10: The optimal trade-off curve between the "H 2 performance loss and the sparsity level of 
F* compared to the centralized gain F c for the spatially distributed example. 



July 26, 2012 



Submitted to IEEE Trans. Automat. Control 



25 



C. Block sparsity: An example from bio-chemical reaction 

Consider a network of N = 5 systems coupled through the following dynamics 



1 N 



I- J) {Xi 



x,) + [B^udi + [B 2 ] 



3 = 1 



where [-j^ denotes the r/'th block of a matrix and 



-1 
3 





-1 

3 



-3 

-1 



[B 



l\ii 



3 
3 
3 



2 m 



3 





The performance weights Q and R are set to identity matrices. Systems of this form arise in 
bio-chemical reactions with a cyclic negative feedback [54]. 

We use the weighted sum of Frobenius norms as the sparsity-promoting penalty function and 
we set the weights to be inversely proportional to the Frobenius norm of the solution F*- 
(SP) at the previous value of 7, i.e., Wy = 1/(\\F*A\ F + e) with e = 10~ 3 . As 7 increases, the 



to 



number of nonzero blocks in the feedback gain F decreases. Figure 1 1 shows sparsity patterns 



of feedback gains resulting from solving |(SP) with sparse and block sparse penalty functions. 
Setting 7 to values that yield the same number of nonzero elements in these feedback gains 
results in the block sparse feedback gain with a smaller number of nonzero blocks. In particular, 



the first two rows of the block sparse feedback gain in Fig. 11a are identically equal to zero 
(indicated by blank space). This means that the subsystems 1 and 2 do not need to be actuated. 
Thus, the communication graph determined by the block sparse feedback gain has fewer links; 



cf. Figs. 12a and 12b 



VI. Concluding Remarks 

We design sparse and block sparse state feedback gains that optimize the I-L2 performance 
of distributed systems. The design procedure consists of a structure identification step and 
a polishing step. In the identification step, we employ the ADMM algorithm to solve the 
sparsity-promoting optimal control problem, whose solution gradually moves from the centralized 
gain to the sparse gain of interest as our emphasis on the sparsity-promoting penalty term is 
increased. In the polishing step, we use Newton's method in conjunction with conjugate gradient 
scheme to solve the minimum variance problem subject to the identified sparsity constraints. We 
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Fig. 11: The sparse feedback gains obtained by solving |(SP)| (a) using the weighted sum of 
Frobenius norms with 7 = 3.6 and (b) using the weighted t\ norm (TTOl) with 7 = 1.3. Here, 



F e 



>5xl5 



is partitioned into 25 blocks Fij G 



>lx3 



Both feedback gains have the same number 



of nonzero elements (indicated by dots) and close I-L2 performance (less than 1% difference), 
but different number of nonzero blocks (indicated by boxes). 





(a) 



(b) 



Fig. 12: Communication graphs of (a) the block sparse feedback gain in Fig. 11a and (b) the 



sparse feedback gain in Fig. |1 lb| (red color highlights the additional links). An arrow pointing 
from node i to node j indicates that node % uses state measurement from node j. 



also establish convexity for a class of feedback synthesis problems that arise in the design of 
sparse undirected networks. This is accomplished by providing an SDP formulation for design 
problems ranging from average consensus, to cooperative control of vehicular formations, to 
synchronization of oscillator networks. 

Although we focus on the I-L2 performance, the developed framework can be extended to 
design problems with other performance indices. We emphasize that the analytical solutions to 
the G-minimization problem are independent of the assigned performance index. Consequently, 
the G-minimization step in ADMM for (SP)| with alternative performance indices can be done 



exactly as in Section IV-A Thus, ADMM provides a flexible framework for sparsity-promoting 



optimal control problems of the form (SP) 



We have recently employed ADMM for selection of an a priori specified number of leaders 
in order to minimize the variance of stochastically forced dynamic networks [42], for adding 
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social links to maximize public awareness in social networks [44], and for identifying sparse 
representations of consensus networks [55]. We also aim to extend the developed framework to 
the observer-based sparse optimal feedback design. Another topic of interest is the development 
of distributed schemes for the F-minimization step in the ADMM algorithm. Recently developed 
tools from distributed optimization [56] may play an important role in this effort. 
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Appendix 

A. Proof of Lemma [7] 

From the LMI in |l6]), it follows that F := F + 1 l T /iV y 0. We next show that F is positive 
definite, F y 0. Using the generalized Schur complement [27, Appendix A. 5. 5], we have 

(J - FF ] )Q 1/2 = 0, (28) 

where F^ is the Moore-Penrose pseudoinverse of F. Consider the spectral decomposition Q = UAU 1 
where U = V] is the orthonormal matrix and A = diag (A) with A = [0 A2 ••• \n] 



and Aj > for i — 2, . . . ,N. Then multiplying U T from the left and U from the right to (28) 
yields 

U T {I - FF r )UA 1/2 = 0. 

It follows that the symmetric matrix U T (I — FF^)U is a diagonal matrix with its diagonal equal 
to from the 2nd to the A^th entry, i.e., 

U T (I -FF ] )U = diag ([a ••• 0]), 

and thus, 

FF ] = I-(a/N)tt T , 
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where the scalar a is to be determined. We note that a ^ 1, because otherwise FF^ — I — 
11 T /N implies that the range space of F is orthogonal to 1 (i.e., Ft = 0), which leads to the 
contradiction 

= Ft = (F + tt T /N)t = 1. 
Since /— (a/N)tt T is not invertible for any a ^ 1, we conclude that F is of full rank. Therefore, 



F y and FF^ = I; thus, a = 0. Then the equivalence between ( 15 ) and (16) can be established 
by noting that 

X Q 1 ' 2 



Q 



1/2 



y o 



x y q 1/2 f- 1 q 1 



/2 



whenever F y 0. To minimize the objective function of (16) for F y 0, we simply take 



X = Q l / 2 F 1 Q 1 / 2 , which yields the objective function J in (15). This completes the proof. 



B. Formulae for the gradient and Hessian of J in (13) and (14) 



We begin by writing the symmetric gain F in (14) using the incidence matrix 



EK r E 



5> 



1,3 



where K r is a diagonal matrix with its diagonal determined by k r — [k±2 • • • fenv-iw] £ K m 
with m = N(N — l)/2, and E is the incidence matrix of a complete graph [15], [57], defined 



as E = [e i2 • • • e(Ar_i )A r] G 



^ xm . Here, ey G 



5 AT 



takes 1 and —1 at the ith and jth entries, 



respectively, and otherwise, and i goes from 1 to N and j goes from i + 1 to iV for a fixed i. 



The gradient and Hessian of J in (14) with respect to k r are given by 

V fcr J = - {I / 2) dmg {E T {F + tt T /NY 1 Q{F + tt T /N)-^ - E T RE) 
V£J = (E T (F + ll T /A^)- 1 g(F + ll T /^) _1 J B)o(7i; T (F + ll T /iV) _1 J B). 

The detailed derivation is omitted for brevity; see [57] for related results. For the objective 



function (13), the symmetric gain F can be written as 

F = EK r E T + K a = [E I 



K r 




' E T ' 


K a 




I 
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where K a is a diagonal matrix with its diagonal determined by k a — [ kn ■ ■ ■ k NN ] G M. N . Let 
E := [E I] G f( JV+m ' XJV and let k := [kj k?} T G R N+m . Then the gradient and Hessian of J 



in (13) with respect to k are given by 



V K J = - (1/2) diag (E T F~ 1 QF~ 1 E - E T RE) 
V 2 K J = (E T F- l QF- 1 E) o (E T F' 1 E). 



C. Derivation of (|25|)-(|26|) 

The first step is to show that the minimizer of 0^ can be written as G*- = rV^ where r G [0, 1]. 
This can be seen by noting that for > 0, we have G [0, Vy] since both the logarithmic 
function log (1 + \Gij\/e) and the quadratic function (G^j — Vij) 2 are monotonically increasing 
for Gij > and monotonically decreasing for Gy < 0. A similar arguments shows that G*j 
belongs to [Vy,0] for < 0. Thus, minimizing (pij(Gij) is equivalent to minimizing 



subject to the constraint < r < 1. Thus, we have converted a nondifferentiable unconstrained 
problem to a differentiable but constrained one. 

We will now examine the sign of d<pij/dr for r G [0, 1]. Setting dcpij/dr = yields a quadratic 
equation (for ^ 0) 

\V t] \r 2 + {e- \Vij\y + — r — -e = 0. (29) 

If the discriminant A < 0, then d&j/dr > and is monotonically nondecreasing for 
r G [0, 1]; thus, the minimizer is r* = 0. Let us assume that A > and let be the solutions 



to (29). It is then readily verified that 

1 



Then the sign of dcj)ij/dr for r G [0, 1] can be determined by checking the values of r ± . 

1) If < 0, then d<pij/dr > for r G [0, 1]; thus, the minimizer is r* = 0. 

2) If r~ < and < r + < 1, then d(f>ij/dr < for r G [0, r + ] and d<pij/dr > for 
r G (r + , 1]; thus, the minimizer is r* = r + . 



July 26, 2012 



Submitted to IEEE Trans. Automat. Control 



30 



3) Finally, if < r ± < 1, then d<pij/dr > for r e [0,r~), d<pij/dr < for r e [r~,r + ), 
and d(pij/dr > for r e [r~, 1]. Therefore, r~ is a local maximizer and r + is a local 
minimizes Thus, the candidates for r* are {0,r + }. 
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